An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition
Abstract
Several approaches have been described for the automatic unsupervised acquisition of patterns for information extraction. Each approach is based on a particular model for the patterns to be acquired, such as a predicate-argument structure or a dependency chain. The effect of these alternative models has not been previously studied. In this paper, we compare the prior models and introduce a new model, the Subtree model, based on arbitrary subtrees of dependency trees. We describe a discovery procedure for this model and demonstrate experimentally an improvement in recall using Subtree patterns.
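To make the difference between pattern models concrete, here is a minimal Python sketch. It is not the discovery procedure from the paper. It assumes a toy dependency tree (the words and the `TREE` structure are hypothetical), treats a chain as a downward path from a head, and treats a subtree as any connected set of nodes. Dependency labels and pattern ranking are left out.

```python
from itertools import product

# Hypothetical toy dependency tree: head -> list of dependents (assumption).
TREE = {
    "triggered": ["bomb", "explosion"],
    "explosion": ["downtown"],
    "bomb": [],
    "downtown": [],
}

def chains(tree, root):
    """All downward paths starting at root (chain-style candidates)."""
    paths = [(root,)]
    for child in tree[root]:
        paths += [(root,) + p for p in chains(tree, child)]
    return paths

def rooted_subtrees(tree, root):
    """All connected subtrees whose top node is root, as frozensets of nodes."""
    # For each dependent: either leave it out (None) or attach one of its subtrees.
    options = [[None] + rooted_subtrees(tree, c) for c in tree[root]]
    result = []
    for choice in product(*options):
        nodes = {root}
        for sub in choice:
            if sub is not None:
                nodes |= sub
        result.append(frozenset(nodes))
    return result

def all_subtrees(tree):
    """Subtree-style candidates: connected subtrees rooted at any node."""
    return [s for node in tree for s in rooted_subtrees(tree, node)]

if __name__ == "__main__":
    print(len(chains(TREE, "triggered")), "chains from the root")
    print(len(rooted_subtrees(TREE, "triggered")), "subtrees rooted at the root")
```

In this toy tree every chain is also a subtree, but a subtree such as {triggered, bomb, explosion} branches and so is not a chain. A larger candidate space of this kind is one plausible reading of why the abstract reports a recall improvement for Subtree patterns.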
Similar resources
Comparing Information Extraction Pattern Models
Several recently reported techniques for the automatic acquisition of Information Extraction (IE) systems have used dependency trees as the basis of their extraction pattern representation. These approaches have used a variety of pattern models (schemes for representing IE patterns based on particular parts of the dependency analysis). An appropriate model should be expressive enough to represe...
Japanese Information Extraction with Automatically Extracted Patterns
One of the central issues for information extraction (IE) systems is the cost of customization from one scenario to another. Research on the automated acquisition of patterns is important for portability and scalability. This paper explores the automatic extraction of patterns in Japanese from unannotated text. We introduce two modules of our system, the pattern extraction module and the inform...
A Task-based Comparison of Information Extraction Pattern Models
Several recent approaches to Information Extraction (IE) have used dependency trees as the basis for an extraction pattern representation. These approaches have used a variety of pattern models (schemes which define the parts of the dependency tree which can be used to form extraction patterns). Previous comparisons of these pattern models are limited by the fact that they have used indirect ta...
Automatic Discovery of Linguistic Patterns for Information Extraction
Information Extraction (IE) systems typically rely on extraction patterns encoding domain-specific knowledge. When matched against natural language texts, these patterns recognize with high accuracy information relevant to the extraction task. Adapting an IE system to a new extraction scenario entails devising a new collection of extraction patterns, a time-consuming and expensive process. To ov...
On the Expressiveness of Information Extraction Patterns
Many recently reported machine learning approaches to the acquisition of information extraction (IE) patterns have used dependency trees as the basis for their pattern representations (Yangarber et al., 2000a; Yangarber, 2003; Sudo et al., 2003; Stevenson and Greenwood, 2005). While varying results have been reported for the resulting IE systems, little has been reported about the ability of dep...